Abstract

We consider finite-state time-nonhomogeneous Markov chains whose transition matrix at time $n$ is $I + G/n^\zeta$, where $G$ is a "generator" matrix, that is, $G(i,j) > 0$ for $i,j$ distinct and $G(i,i) = -\sum_{k \neq i} G(i,k)$, and $\zeta > 0$ is a strength parameter. In these chains, as time grows, the positions are less and less likely to change, and so they form simple models of age-dependent time-reinforcing schemes. These chains, however, exhibit some different, perhaps unexpected, occupation behaviors depending on parameters.

Although it is shown, on the one hand, that the position at time $n$ converges to a point-mixture for all $\zeta > 0$, on the other hand, the average occupation vector up to time $n$, when variously $0 < \zeta < 1$, $\zeta > 1$ or $\zeta = 1$, is seen to converge to a constant, a point-mixture, or a distribution $\mu_G$ with no atoms and full support on a simplex, respectively, as $n \uparrow \infty$. This last type of limit can be interpreted as a sort of "spreading" between the cases $0 < \zeta < 1$ and $\zeta > 1$.

In particular, when $G$ is appropriately chosen, intriguingly, $\mu_G$ is a Dirichlet distribution, reminiscent of results in Pólya urns.
1 Introduction and Results

In this article, we study laws of large numbers (LLN) for a class of finite-space time-nonhomogeneous Markov chains where, as time increases, positions are less likely to change. Although these chains feature simple age-dependent time-reinforcing dynamics, some different, perhaps unexpected, LLN occupation behaviors emerge depending on parameters. A specific case, as in Example 1.1, was first introduced in Gantert [8] in connection with the analysis of certain simulated annealing LLN phenomena.

Example 1.1 Suppose there are only two states 1 and 2, and that the chain moves between the two locations in the following way: at large times $n$, the chain switches places with probability $c/n$, and stays put with complementary probability $1 - c/n$, for $c > 0$. The chain, as it ages, is less inclined to leave its spot, but nonetheless switches infinitely often (as $\sum_n c/n = \infty$). One can see that the probability of being in state 1 tends to 1/2 regardless of the initial distribution. One may ask, however, how the average location, or frequency, of state 1 behaves asymptotically. For this example, it was shown in [8] and Ex. 7.1.1 [28], perhaps surprisingly, that any LLN limit could not be a constant, or even converge in probability, without further identification. However, a quick consequence of our results is that the average occupation of state 1 converges weakly to the Beta$(c,c)$ distribution (Theorem 1.4).
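A minimal Monte Carlo sketch of Example 1.1, for illustration only. The helper name `occupation_frequency`, the uniform random start, and the rule that no switch is attempted before time $\lceil c \rceil + 1$ (mirroring the threshold $n(G,1)$ used below) are our assumptions. For $c = 1$ the Beta$(1,1)$ limit is uniform on $[0,1]$, so the simulated frequencies should spread out with variance near $1/12$ instead of concentrating at $1/2$.

```python
import math
import random

def occupation_frequency(c, n_steps, rng):
    """Fraction of times 1..n_steps spent in state 1 by the two-state chain of Ex. 1.1.

    Assumptions (illustrative, not from the text): X_1 is uniform on {1, 2}, and a
    switch is only attempted once the switching probability c/(k+1) is at most 1.
    """
    state = rng.choice((1, 2))
    visits_to_1 = 0
    for k in range(1, n_steps + 1):
        visits_to_1 += state == 1
        # Between times k and k+1 the chain switches with probability c/(k+1).
        if k + 1 > math.ceil(c) and rng.random() < c / (k + 1):
            state = 3 - state
    return visits_to_1 / n_steps

rng = random.Random(2024)
samples = [occupation_frequency(1.0, 2000, rng) for _ in range(400)]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(f"mean {mean:.3f}, variance {var:.3f} (Beta(1,1): 0.500, {1/12:.3f})")
```

Replacing `c / (k + 1)` by `c / (k + 1) ** zeta` with $\zeta < 1$ or $\zeta > 1$ gives a quick way to see the constant and point-mixture regimes discussed below.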

More specifically, we consider a general version of this scheme with $m \geq 2$ possible locations, and moving and staying probabilities $G(i,j)/n^\zeta$ and $1 - \sum_{k \neq i} G(i,k)/n^\zeta$, from $i$ to $j \neq i$ and from $i$ to $i$ respectively, at time $n$, where $G = \{G(i,j)\}$ is an $m \times m$ matrix and $\zeta > 0$ is a strength parameter. After observing that the location probabilities tend to a distribution which depends on $G$, $\zeta$ and the initial probability $\pi$ when $\zeta > 1$, but does not depend on $\zeta$ and $\pi$ when $\zeta \leq 1$ (Theorem 1.1), we find that the results on the average occupation vector limit separate roughly into three cases, depending on whether $0 < \zeta < 1$, $\zeta = 1$, or $\zeta > 1$.

When $0 < \zeta < 1$, following [8], the average occupation is seen to converge to a constant in probability; when, more specifically, $0 < \zeta < 1/2$, this convergence is proved to be a.s. When $\zeta > 1$, as there are only a finite number of switches, the position eventually stabilizes and the average occupation converges to a mixture of point masses (Theorem 1.2).

Our main results are when $\zeta = 1$. In this case, we show that the average occupation converges to a non-atomic distribution $\mu_G$, with full support on a simplex, identified by its moments (Theorems 1.3 and 1.5). When, in particular, $G$ takes the form $G(i,j) = \theta_j$ for all $i \neq j$, that is, when the transitions into a state $j$ are constant, $\mu_G$ takes the form of a Dirichlet distribution with parameters $\{\theta_j\}$ (Theorem 1.4). The proofs of these statements follow by the method of moments, and some surgeries of the paths.

The heuristic is that when $0 < \zeta < 1$ the chance of switching is strong and sufficient mixing leads to constant limits, but when $\zeta > 1$ there is little movement, giving point-mixture limits. The case $\zeta = 1$ is the intermediate "spreading" situation leading to non-atomic limits. For example, with respect to Ex. 1.1, when the switching probability at time $n$ is $c/n^\zeta$, the Beta$(c,c)$ limit when $\zeta = 1$ interpolates, as $c$ varies on $(0,\infty)$, between the point-mass at 1/2, the frequency limit of state 1 when $0 < \zeta < 1$, and the fair mixture of point-masses at 0 and 1, the limit when $\zeta > 1$ and starting at random (cf. Fig. 1).

Figure 1: Beta(c, c) occupation law of state 1 in Ex. 1.1.
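To quantify this interpolation (a standard computation for Beta laws, included only as an illustration), the variance of the Beta$(c,c)$ limit runs between the variances of the two extreme limits:

```latex
\operatorname{Var}\big(\mathrm{Beta}(c,c)\big) = \frac{c^2}{(2c)^2\,(2c+1)} = \frac{1}{4(2c+1)}
\longrightarrow
\begin{cases}
1/4 & \text{as } c \downarrow 0 \quad (\text{the variance of } \tfrac12\delta_0 + \tfrac12\delta_1),\\
0 & \text{as } c \uparrow \infty \quad (\text{the variance of } \delta_{1/2}).
\end{cases}
```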

In the literature, there are only a few results on LLN's for time-nonhomogeneous Markov chains, often related to simulated annealing and Metropolis algorithms, which can be viewed in terms of a generalized model where $\zeta = \zeta(i,j)$ is a non-negative function. These results relate to the case "$\max \zeta(i,j) < 1$", when the LLN limit is a constant [8], Ch. 7 [28], [9]. See also Ch. 1 [16], [11]; and texts [6], [H], [E] for more on nonhomogeneous Markov chains. In this light, the non-degenerate limits $\mu_G$ found here seem to be novel objects. In terms of simulated annealing, these limits suggest a more complicated LLN picture at the "critical" cooling schedule, when $\zeta(i,j) = 1$ for some pairs $i,j$ in the state space.

The advent of Dirichlet limits, when $G$ is chosen appropriately, seems of particular interest, given similar results for limit color-frequencies in Pólya urns [3], [10], as it hints at an even larger role for Dirichlet measures in related but different "reinforcement"-type models (see [17], [23], [22], and references therein, for more on urn and reinforcement schemes). In this context, the set of "spreading" limits $\mu_G$ in Theorem 1.3, of which Dirichlet measures are but a subset, appears intriguing as well (cf. Remarks 1.4, 1.5 and Fig. 2).

In another vein, although different, Ex. 1.1 seems not so far from the case of independent Bernoulli trials with success probability $1/n$ at the $n$th trial. For such trials much is known about the spacings between successes, and about connections to GEM random allocation models and Poisson-Dirichlet measures [27], [1], [2], [3], [23], [25].

We also mention, in a different, neighboring setting, that some interesting but distinct LLN's have been shown for arrays of time-homogeneous Markov sequences where the transition matrix $P_n$ for the $n$th row converges to a limit matrix $P$ [11], Section 5.3 [15]; see also [21], which comments on some "metastability" concerns.
We now develop some notation to state results. Let $S = \{1, 2, \ldots, m\}$ be a finite set of $m \geq 2$ points. We say a matrix $M = \{M(i,j) : 1 \leq i,j \leq m\}$ on $S$ is a generator matrix if $M(i,j) \geq 0$ for all distinct $1 \leq i,j \leq m$, and $M(i,i) = -\sum_{j \neq i} M(i,j)$ for $1 \leq i \leq m$. In particular, $M$ is a generator with nonzero entries if $M(i,j) > 0$ for $1 \leq i,j \leq m$ distinct, and $M(i,i) < 0$ for $1 \leq i \leq m$.

To avoid technicalities, e.g. with reducibility, we work with the following matrices,
$$\mathbb{G} = \{G : G \text{ is a generator matrix with nonzero entries}\},$$
although extensions should be possible for a larger class. For $G \in \mathbb{G}$, let $n(G,\zeta) = \lceil \max_{1 \leq i \leq m} |G(i,i)|^{1/\zeta} \rceil$, and define, for $\zeta > 0$,
$$P_n^{G,\zeta} = \begin{cases} I & \text{for } 1 \leq n \leq n(G,\zeta), \\ I + G/n^\zeta & \text{for } n \geq n(G,\zeta) + 1, \end{cases}$$
where $I$ is the $m \times m$ identity matrix. Then, for all $n \geq 1$, $P_n^{G,\zeta}$ is ensured to be a stochastic matrix.
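The definition of $P_n^{G,\zeta}$ and the claim that it is stochastic can be checked numerically with a short sketch (plain Python lists; the function names and the example generator are hypothetical, chosen only for illustration):

```python
import math

def n_threshold(G, zeta):
    """n(G, zeta) = ceil(max_i |G(i,i)|^(1/zeta))."""
    return math.ceil(max(abs(G[i][i]) for i in range(len(G))) ** (1.0 / zeta))

def transition_matrix(G, zeta, n):
    """P_n^{G,zeta}: the identity for n <= n(G, zeta), and I + G/n^zeta afterwards."""
    m = len(G)
    scale = 0.0 if n <= n_threshold(G, zeta) else 1.0 / n ** zeta
    return [[float(i == j) + scale * G[i][j] for j in range(m)] for i in range(m)]

def is_stochastic(P, tol=1e-12):
    """Non-negative entries and unit row sums, up to floating-point tolerance."""
    return all(min(row) >= -tol and abs(sum(row) - 1.0) <= tol for row in P)

# Hypothetical example generator: positive off-diagonal entries, zero row sums.
G = [[-3, 1, 2], [2, -3, 1], [1, 2, -3]]
for zeta in (0.5, 1.0, 2.0):
    assert all(is_stochastic(transition_matrix(G, zeta, n)) for n in range(1, 500))
    print(zeta, n_threshold(G, zeta))
```

The threshold is exactly what keeps the diagonal entries $1 + G(i,i)/n^\zeta$ non-negative once the chain starts to move.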

Let $\pi$ be a distribution on $S$, and let $\mathbb{P}_\pi = \mathbb{P}_\pi^{G,\zeta}$ be the (nonhomogeneous) Markov measure on the sequence space $S^{\mathbb{N}}$ with Borel sets, corresponding to initial distribution $\pi$ and transition kernels $\{P_n^{G,\zeta}\}$. That is, with respect to the coordinate process $X = (X_0, X_1, \ldots)$, we have $\mathbb{P}_\pi(X_0 = i) = \pi(i)$ and the Markov property
$$\mathbb{P}_\pi(X_{n+1} = j \mid X_0, X_1, \ldots, X_{n-1}, X_n = i) = P_{n+1}^{G,\zeta}(i,j)$$
for all $i,j \in S$ and $n \geq 0$. Our convention then is that $P_{n+1}^{G,\zeta}$ controls "transitions" between times $n$ and $n+1$. Let also $\mathbb{E}_\pi = \mathbb{E}_\pi^{G,\zeta}$ be expectation with respect to $\mathbb{P}_\pi^{G,\zeta}$. More generally, $E^\mu$ denotes expectation with respect to a measure $\mu$.

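Define the occupation statistic $Z_n = (Z_{1,n}, \ldots, Z_{m,n})$ for $n \geq 1$, where
$$Z_{i,n} = \frac{1}{n} \sum_{k=1}^{n} 1_i(X_k)$$
for $1 \leq i \leq m$. Then, $Z_n$ is an element of the $(m-1)$-dimensional simplex
$$\Delta_m = \Big\{ x : \sum_{i=1}^m x_i = 1,\ 0 \leq x_i \leq 1 \text{ for } 1 \leq i \leq m \Big\}.$$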



The first result is on convergence of the position of the process. For $G \in \mathbb{G}$, let $\nu_G$ be the stationary distribution corresponding to $G$ (of the associated continuous-time homogeneous Markov chain), that is, the unique left eigenvector of the eigenvalue 0, with positive entries, normalized to unit sum.
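For small examples $\nu_G$ can be computed exactly. The sketch below (our own helper names; exact rational arithmetic via Python's `fractions` module) solves $\nu G = 0$ after replacing one of the redundant equations by $\sum_i \nu(i) = 1$; the example matrix has the form $G(i,j) = \theta_j$ for $i \neq j$ used later for Dirichlet limits.

```python
from fractions import Fraction

def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination over the rationals."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]

def stationary(G):
    """nu_G: solves nu G = 0 with the last (redundant) equation replaced by sum(nu) = 1."""
    m = len(G)
    A = [[G[j][i] for j in range(m)] for i in range(m)]  # row i: sum_j nu_j G(j, i) = 0
    A[-1] = [1] * m
    return solve(A, [0] * (m - 1) + [1])

theta = [Fraction(1), Fraction(2), Fraction(3)]
G_theta = [[theta[j] - (sum(theta) if i == j else 0) for j in range(3)] for i in range(3)]
print(stationary(G_theta))  # -> [Fraction(1, 6), Fraction(1, 3), Fraction(1, 2)]
```

Replacing one equation works because the columns of $\nu G$ always sum to zero, so exactly one equation is redundant when $G \in \mathbb{G}$.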
Theorem 1.1 For $G \in \mathbb{G}$, $\zeta > 0$, and initial distribution $\pi$, under $\mathbb{P}_\pi^{G,\zeta}$,
$$X_n \Rightarrow \nu_{G,\pi,\zeta},$$
where $\nu_{G,\pi,\zeta}$ is a probability vector on $S$ depending in general on $G$, $\zeta$ and $\pi$. When $0 < \zeta \leq 1$, $\nu_{G,\pi,\zeta}$ does not depend on $\pi$ and $\zeta$, and reduces to $\nu_{G,\pi,\zeta} = \nu_G$.

Remark 1.1 For $\zeta > 1$, with only finitely many moves, the convergence is a.s., and $\nu_{G,\pi,\zeta}$ is explicit when $G = V_G D_G V_G^{-1}$ is diagonalizable, with $D_G$ diagonal and $D_G(i,i) = \lambda_i$, the $i$th eigenvalue of $G$, for $1 \leq i \leq m$. By calculation, $\nu_{G,\pi,\zeta} = \pi \prod_{n \geq 1} P_n^{G,\zeta} = \pi V_G D' V_G^{-1}$, with $D'$ diagonal and $D'(i,i) = \prod_{n \geq n(G,\zeta)+1} (1 + \lambda_i/n^\zeta)$.

We now consider the cases $\zeta \neq 1$ with respect to average occupation limits. Let $\mathbf{i}$ be the basis vector $\mathbf{i} = (0, \ldots, 0, 1, 0, \ldots, 0) \in \Delta_m$ with a 1 in the $i$th component, and let $\delta_{\mathbf{i}}$ be the point mass at $\mathbf{i}$, for $1 \leq i \leq m$.

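Theorem 1.2 Let $G \in \mathbb{G}$, and let $\pi$ be an initial distribution. Under $\mathbb{P}_\pi^{G,\zeta}$, we have that
$$Z_n \to \nu_G$$
in probability when $0 < \zeta < 1$; when, more specifically, $0 < \zeta < 1/2$, this convergence is $\mathbb{P}_\pi^{G,\zeta}$-a.s.

However, when $\zeta > 1$, under $\mathbb{P}_\pi^{G,\zeta}$,
$$Z_n \Rightarrow \sum_{i=1}^m \nu_{G,\pi,\zeta}(i)\,\delta_{\mathbf{i}}.$$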

Remark 1.2 Simulations suggest that a.s. convergence might actually hold also on the range $1/2 \leq \zeta < 1$ (with worse convergence rates as $\zeta \uparrow 1$).

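Let now $\gamma_1, \ldots, \gamma_m \geq 0$ be integers such that $\gamma = \sum_{i=1}^m \gamma_i \geq 1$. Define the list
$$A = \{a_i : 1 \leq i \leq \gamma\} = \{\underbrace{1, \ldots, 1}_{\gamma_1}, \underbrace{2, \ldots, 2}_{\gamma_2}, \ldots, \underbrace{m, \ldots, m}_{\gamma_m}\}.$$
Let $\mathbb{S}(\gamma_1, \ldots, \gamma_m)$ be the $\gamma!$ permutations of $A$, although there are only $\binom{\gamma}{\gamma_1, \ldots, \gamma_m}$ distinct permutations; that is, each permutation appears $\prod_{k=1}^m \gamma_k!$ times.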
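Python's `itertools.permutations` treats equal entries as distinct, so it produces exactly the $\gamma!$-element list $\mathbb{S}(\gamma_1,\ldots,\gamma_m)$ with its repetitions. A small sketch (the helper name is ours):

```python
from collections import Counter
from itertools import permutations
from math import factorial, prod

def permutation_list(gammas):
    """S(gamma_1, ..., gamma_m): all gamma! orderings of A = (1^gamma_1, ..., m^gamma_m).

    Equal entries are treated as distinct, so repeated orderings are kept,
    exactly as in the definition of S.
    """
    A = [state for state, g in enumerate(gammas, start=1) for _ in range(g)]
    return list(permutations(A))

gammas = (2, 1)
S = permutation_list(gammas)
counts = Counter(S)
print(len(S), len(counts))  # gamma! = 6 permutations, 3 distinct ones
assert len(S) == factorial(sum(gammas))
assert all(c == prod(factorial(g) for g in gammas) for c in counts.values())
```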

Note also, for $G \in \mathbb{G}$, being a generator matrix, all eigenvalues of $G$ have non-positive real parts (indeed, $I + G/k$ is a stochastic matrix for $k$ large; then, by Perron-Frobenius, the real parts of its eigenvalues satisfy $-1 \leq 1 + \mathrm{Re}(\lambda_l)/k \leq 1$, yielding the non-positivity), and so the resolvent $(xI - G)^{-1}$ is well defined for $x \geq 1$.

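Theorem 1.3 For $\zeta = 1$, $G \in \mathbb{G}$, and initial distribution $\pi$, we have under $\mathbb{P}_\pi^{G,1}$ that
$$Z_n \Rightarrow \mu_G,$$
where $\mu_G$ is a measure on the simplex $\Delta_m$ characterized by its moments: for $1 \leq i \leq m$,
$$E^{\mu_G}[x_i] = \lim_{n\to\infty} \mathbb{E}_\pi[Z_{i,n}] = \nu_G(i),$$
and, for integers $\gamma_1, \ldots, \gamma_m \geq 0$ when $\gamma \geq 2$,
$$E^{\mu_G}\big[x_1^{\gamma_1} \cdots x_m^{\gamma_m}\big] = \lim_{n\to\infty} \mathbb{E}_\pi\big[Z_{1,n}^{\gamma_1} \cdots Z_{m,n}^{\gamma_m}\big] = \frac{1}{\gamma} \sum_{\sigma \in \mathbb{S}(\gamma_1,\ldots,\gamma_m)} \nu_G(\sigma_1) \prod_{j=1}^{\gamma-1} (jI - G)^{-1}(\sigma_j, \sigma_{j+1}).$$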
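For small $\gamma$ the moment formula of Theorem 1.3 can be evaluated exactly by brute force over $\mathbb{S}(\gamma_1,\ldots,\gamma_m)$. The sketch below uses our own helper names, exact rational arithmetic via `fractions`, states indexed from 0, and expects $\nu_G$ to be supplied by the caller. As a check, it reproduces a Beta$(c,c)$ moment for the two-state chain of Example 1.1.

```python
from fractions import Fraction
from itertools import permutations

def inverse(M):
    """Exact inverse of a square matrix by Gauss-Jordan elimination over the rationals."""
    n = len(M)
    A = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(M)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                f = A[r][col]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[col])]
    return [row[n:] for row in A]

def mu_moment(G, nu, gammas):
    """E^{mu_G}[x_1^g_1 ... x_m^g_m] from Theorem 1.3, for gamma = sum(gammas) >= 2.

    Computes (1/gamma) * sum_{sigma in S} nu(sigma_1) * prod_j (jI - G)^{-1}(sigma_j, sigma_{j+1}).
    """
    m, gamma = len(G), sum(gammas)
    # R[j] = (jI - G)^{-1} for j = 1, ..., gamma - 1 (index 0 unused).
    R = [None] + [inverse([[j * (s == t) - G[s][t] for t in range(m)] for s in range(m)])
                  for j in range(1, gamma)]
    A = [s for s, g in enumerate(gammas) for _ in range(g)]
    total = Fraction(0)
    for sigma in permutations(A):  # all gamma! elements of S, repeats included
        term = Fraction(nu[sigma[0]])
        for j in range(1, gamma):
            term *= R[j][sigma[j - 1]][sigma[j]]
        total += term
    return total / gamma

c = Fraction(3, 2)
G2 = [[-c, c], [c, -c]]
half = [Fraction(1, 2), Fraction(1, 2)]
print(mu_moment(G2, half, (2, 0)))  # Beta(c, c): (c + 1) / (2 (2c + 1)) = 5/16
```

Enumerating all $\gamma!$ permutations is only practical for small $\gamma$; it is meant as a check of the formula, not as an efficient evaluation.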



Remark 1.3 However, as in Ex. 1.1 and [8], when $\zeta = 1$ as above, $Z_n$ cannot converge in probability (as the tail field $\bigcap_n \sigma\{X_n, X_{n+1}, \ldots\}$ is trivial by Theorem 1.2.13 and Proposition 1.2.4 [16] and (2.3), but the limit distribution is not a point-mass by, say, Theorem 1.5 below). This is in contrast to Pólya urns, where the color frequencies converge a.s.

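We now consider a particular matrix under which $\mu_G$ is a Dirichlet distribution. For $\theta_1, \ldots, \theta_m > 0$, define
$$G_\Theta = \begin{pmatrix} \theta_1 - \theta & \theta_2 & \theta_3 & \cdots & \theta_m \\ \theta_1 & \theta_2 - \theta & \theta_3 & \cdots & \theta_m \\ \vdots & & \ddots & & \vdots \\ \theta_1 & \theta_2 & \cdots & \theta_{m-1} & \theta_m - \theta \end{pmatrix},$$
where $\theta = \sum_{l=1}^m \theta_l$. It is clear that $G_\Theta \in \mathbb{G}$. Recall the identification of the Dirichlet distribution by its density and moments; see [18], [26] for more on these distributions. Namely, the Dirichlet distribution on the simplex $\Delta_m$ with parameters $\theta_1, \ldots, \theta_m$ (abbreviated as Dir$(\theta_1, \ldots, \theta_m)$) has density
$$\frac{\Gamma(\theta)}{\Gamma(\theta_1) \cdots \Gamma(\theta_m)}\, x_1^{\theta_1 - 1} \cdots x_m^{\theta_m - 1}.$$
The moments with respect to integers $\gamma_1, \ldots, \gamma_m \geq 0$ with $\gamma \geq 1$ are
$$E\big[x_1^{\gamma_1} \cdots x_m^{\gamma_m}\big] = \frac{\prod_{i=1}^m \theta_i(\theta_i + 1) \cdots (\theta_i + \gamma_i - 1)}{\prod_{l=0}^{\gamma-1} (\theta + l)}, \qquad (1.1)$$
where we take $\theta_i(\theta_i + 1) \cdots (\theta_i + \gamma_i - 1) = 1$ when $\gamma_i = 0$.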
Theorem 1.4 We have $\mu_{G_\Theta} = \mathrm{Dir}(\theta_1, \ldots, \theta_m)$.

Remark 1.4 Moreover, by comparing the first few moments in Theorem 1.3 with (1.1), one can check that $\mu_G$ is not a Dirichlet measure for many $G$'s with $m \geq 3$. However, when $m = 2$, any $G \in \mathbb{G}$ takes the form of $G_\Theta$ with $\theta_1 = G(2,1)$ and $\theta_2 = G(1,2)$, and so $\mu_G = \mathrm{Dir}(G(2,1), G(1,2))$.
Figure 2: Empirical $\mu_G$ densities under $G_{\mathrm{left}}$ and $G_{\mathrm{right}}$, respectively.



We now characterize the measures $\{\mu_G : G \in \mathbb{G}\}$ as "spreading" measures, different from the limits when $0 < \zeta < 1$ and $\zeta > 1$.

Theorem 1.5 Let $G \in \mathbb{G}$. Then, (1) $\mu_G(U) > 0$ for any non-empty open set $U \subset \Delta_m$. Also, (2) $\mu_G$ has no atoms.



Remark 1.5 We suspect that better estimates in the proof of Theorem 1.5 will show that $\mu_G$ is in fact mutually absolutely continuous with respect to Lebesgue measure on $\Delta_m$. Of course, in this case, it would be of interest to find the density of $\mu_G$. Meanwhile, we give in Figure 2 two histograms of the empirical density when $m = 3$, found by calculating 1000 averages, each on a run of time-length 10000 starting at random on $S$ at time $n(G,1)$ ($= 3, 1$ respectively), where $G$ takes the forms
$$G_{\mathrm{left}} = \begin{pmatrix} -3 & 1 & 2 \\ 2 & -3 & 1 \\ 1 & 2 & -3 \end{pmatrix} \quad\text{and}\quad G_{\mathrm{right}} = \begin{pmatrix} -0.4 & 0.2 & 0.2 \\ 0.3 & -0.6 & 0.3 \\ 0.5 & 0.5 & -1 \end{pmatrix}.$$
To help visualize the plots, $\Delta_3$ is mapped to the plane by the linear transformation $f(x) = x_1 f((1,0,0)) + x_2 f((0,1,0)) + x_3 f((0,0,1))$, where $f((1,0,0)) = (\sqrt{2}, 0)$, $f((0,1,0)) = (0,0)$ and $f((0,0,1)) = \sqrt{2}\,(1/2, \sqrt{3}/2)$. The map maintains a distance $\sqrt{2}$ between the transformed vertices.
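The plotting map $f$ is easy to reproduce; a minimal sketch (our own names) that also checks the $\sqrt{2}$ spacing of the transformed vertices:

```python
import math

# Images of the vertices (1,0,0), (0,1,0), (0,0,1) of Delta_3 under f.
F_VERTICES = ((math.sqrt(2), 0.0),
              (0.0, 0.0),
              (math.sqrt(2) / 2, math.sqrt(6) / 2))  # sqrt(2) * (1/2, sqrt(3)/2)

def to_plane(x):
    """f(x) = x1 f(e1) + x2 f(e2) + x3 f(e3) for a point x of the simplex Delta_3."""
    return tuple(sum(xi * v[k] for xi, v in zip(x, F_VERTICES)) for k in range(2))

for a in range(3):
    for b in range(a + 1, 3):
        print(a + 1, b + 1, math.dist(F_VERTICES[a], F_VERTICES[b]))  # each about sqrt(2)
```

An empirical occupation vector $Z_n$ can then be plotted as `to_plane(Z_n)`.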



We now comment on the plan of the paper. The proofs of Theorems 1.1 and 1.2, 1.3, 1.4, and 1.5 (1) and (2) are in Sections 2, 3, 4, 5, and 6, respectively. These sections do not depend structurally on each other.
2 Proofs of Theorems 1.1 and 1.2



We first recall some results for nonhomogeneous Markov chains in the literature. For a stochastic matrix $P$ on $S$, define the "contraction coefficient"
$$c(P) = \max_{x,y} \frac{1}{2} \sum_{z} \big|P(x,z) - P(y,z)\big| = 1 - \min_{x,y} \sum_{z} \min\{P(x,z), P(y,z)\}. \qquad (2.1)$$
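Both forms of (2.1) are straightforward to compute. The sketch below (our own function names, exact arithmetic with `fractions`) checks that they agree, and that for Example 1.1 with $c = 1$ one gets $c(P_n) = 1 - 2/n$, which illustrates the form $1 - C_G/n^\zeta$ used in the proofs below.

```python
from fractions import Fraction

def contraction_coefficient(P):
    """c(P) in (2.1): the largest total-variation distance between two rows of P."""
    m = len(P)
    return max(sum(abs(P[x][z] - P[y][z]) for z in range(m)) / 2
               for x in range(m) for y in range(m))

def contraction_coefficient_overlap(P):
    """Equivalent form 1 - min_{x,y} sum_z min(P(x,z), P(y,z))."""
    m = len(P)
    return 1 - min(sum(min(P[x][z], P[y][z]) for z in range(m))
                   for x in range(m) for y in range(m))

# Ex. 1.1 with c = 1: P_n = I + G/n with G = [[-1, 1], [1, -1]], so c(P_n) = 1 - 2/n.
for n in range(2, 20):
    P = [[1 - Fraction(1, n), Fraction(1, n)], [Fraction(1, n), 1 - Fraction(1, n)]]
    assert contraction_coefficient(P) == contraction_coefficient_overlap(P) == 1 - Fraction(2, n)
```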



The following is, for instance, Theorem 4.5.1 [28].

Proposition 2.1 Let $X_n$ be a time-nonhomogeneous Markov chain on $S$ connected by transition matrices $\{P_n\}$ with corresponding stationary distributions $\{\nu_n\}$. Suppose
$$\prod_{n=1}^{\infty} c(P_n) = 0 \quad\text{and}\quad \sum_{n=1}^{\infty} \|\nu_n - \nu_{n+1}\|_{\mathrm{var}} < \infty. \qquad (2.2)$$
Then, $\nu = \lim_{n\to\infty} \nu_n$ exists, and, starting from any initial distribution $\pi$, we have for each $k \in S$ that
$$\lim_{n\to\infty} \mathbb{P}(X_n = k) = \nu(k).$$

The following is stated in Section 2 [8] as a consequence of result (1.2.22) and Theorem 1.2.23 in [16].

Proposition 2.2 Given the setting of Proposition 2.1, suppose (2.2) is satisfied, and $c_n = \max_{n_0 \leq i \leq n} c(P_i) < 1$ for all $n \geq n_0$, for some $n_0 \geq 1$. Let $\pi$ be any initial distribution, and $f : S \to \mathbb{R}$ any function. Then, we have the convergence
$$\frac{1}{n} \sum_{i=1}^{n} f(X_i) \to E^{\nu}[f]$$
in the following senses:

(i) in probability, when $\lim_{n\to\infty} n(1 - c_n) = \infty$;
(ii) a.s., when $\sum_{n \geq n_0} 2^{-n}(1 - c_{2^n})^{-2} < \infty$.

Proof of Theorem 1.1. We first consider when $\zeta > 1$. In this case there are only a finite number of movements by Borel-Cantelli, since $\sum_{n \geq 1} \mathbb{P}_\pi(X_n \neq X_{n+1}) \leq \max_i |G(i,i)| \sum_{n \geq 1} n^{-\zeta} < \infty$. Hence there is a time of last movement $N < \infty$ a.s. Then, $\lim X_n = X_N$ a.s., and, for $k \in S$, the limit distribution $\nu_{G,\pi,\zeta}$ is defined and given by $\mathbb{P}_\pi(X_N = k) = \nu_{G,\pi,\zeta}(k)$.

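When $0 < \zeta \leq 1$, as $G \in \mathbb{G}$, by calculation with (2.1), $c(P_n^{G,\zeta}) = 1 - C_G/n^\zeta$ for all $n \geq n_0(G,\zeta)$ large enough and a constant $C_G > 0$. Then, since $\sum_n C_G/n^\zeta = \infty$,
$$\prod_{n \geq 1} c(P_n^{G,\zeta}) \leq \prod_{n \geq n_0(G,\zeta)} \Big(1 - \frac{C_G}{n^\zeta}\Big) = 0. \qquad (2.3)$$
Since, for $n > n(G,\zeta)$, $\nu_G P_n^{G,\zeta} = \nu_G(I + G/n^\zeta) = \nu_G$ (and $\nu_G$ is trivially stationary for $P_n = I$), we may take $\nu_n = \nu_G$ for all $n$; the second condition of Proposition 2.1 is then trivially satisfied, and hence the result follows. □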

Proof of Theorem 1.2. When $\zeta > 1$, as mentioned in the proof of Theorem 1.1, there are only a finite number of moves a.s., and so a.s. $\lim Z_n = \sum_{k=1}^m 1_{[X_N = k]}\,\mathbf{k}$ concentrates on the basis vectors $\{\mathbf{k}\}$. Hence, as defined in the proof of Theorem 1.1, $\mathbb{P}_\pi(X_N = k) = \nu_{G,\pi,\zeta}(k)$, and the result follows.

When $0 < \zeta < 1$, we apply Proposition 2.2 and follow the method in [8]. First, as in the proof of Theorem 1.1, (2.2) holds, and $c(P_n^{G,\zeta}) = 1 - C_G/n^\zeta$ for a constant $C_G > 0$ and all $n \geq n_0(G,\zeta)$. Then, $c_n = \max_{n_0(G,\zeta) \leq i \leq n} c(P_i^{G,\zeta}) = 1 - C_G/n^\zeta < 1$. Now, $n(1 - c_n) = C_G n^{1-\zeta} \uparrow \infty$, giving the convergence in probability in part (i). For a.s. convergence in part (ii) when $0 < \zeta < 1/2$, note
$$\sum_n 2^{-n}(1 - c_{2^n})^{-2} = \sum_n 2^{-n}\big(C_G/(2^n)^\zeta\big)^{-2} = C_G^{-2} \sum_n 2^{-(1-2\zeta)n} < \infty. \qquad \square$$

3 Proof of Theorem 1.3



In this section, as $\zeta = 1$ is fixed, we suppress notational dependence on $\zeta$. Also, as $Z_n$ takes values in the compact set $\Delta_m$, the weak convergence in Theorem 1.3 follows by convergence of the moments.

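The next lemma establishes convergence of the first moments.

Lemma 3.1 For $G \in \mathbb{G}$, $1 \leq k \leq m$, and initial distribution $\pi$,
$$\lim_{n\to\infty} \mathbb{E}_\pi(Z_{k,n}) = \nu_G(k).$$
Proof. From Theorem 1.1 and Cesàro convergence,
$$\lim_n \mathbb{E}_\pi(Z_{k,n}) = \lim_n \frac{1}{n} \sum_{i=1}^n \mathbb{E}_\pi 1_k(X_i) = \lim_n \frac{1}{n} \sum_{i=1}^n \mathbb{P}_\pi(X_i = k) = \nu_G(k). \qquad \square$$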

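We now turn to the joint moment limits in several steps, and will assume in the following that $\gamma_1, \ldots, \gamma_m \geq 0$ with $\gamma \geq 2$. The first step is an "ordering of terms."

Lemma 3.2 For $G \in \mathbb{G}$, and initial distribution $\pi$, we have
$$\lim_{n\to\infty} \Bigg| \mathbb{E}_\pi\big[Z_{1,n}^{\gamma_1} \cdots Z_{m,n}^{\gamma_m}\big] - \frac{1}{n^\gamma} \sum_{\sigma \in \mathbb{S}(\gamma_1,\ldots,\gamma_m)} \sum_{i_1=1}^{n-\gamma+1} \sum_{i_2 > i_1}^{n-\gamma+2} \cdots \sum_{i_\gamma > i_{\gamma-1}}^{n} \mathbb{E}_\pi\Big[\prod_{l=1}^{\gamma} 1_{\sigma_l}(X_{i_l})\Big] \Bigg| = 0.$$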
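Proof. By definition of $\mathbb{S}(\gamma_1, \ldots, \gamma_m)$,
$$\mathbb{E}_\pi\big[Z_{1,n}^{\gamma_1} \cdots Z_{m,n}^{\gamma_m}\big] = \frac{1}{\gamma!\, n^\gamma} \sum_{\sigma \in \mathbb{S}(\gamma_1,\ldots,\gamma_m)} \sum_{1 \leq i_1, \ldots, i_\gamma \leq n} \mathbb{E}_\pi\Big[\prod_{l=1}^{\gamma} 1_{\sigma_l}(X_{i_l})\Big].$$
Note now
$$\sum_{\sigma \in \mathbb{S}(\gamma_1,\ldots,\gamma_m)} \sum_{1 \leq i_1, \ldots, i_\gamma \leq n} 1 = \gamma!\, n^\gamma \quad\text{and}\quad \sum_{\sigma \in \mathbb{S}(\gamma_1,\ldots,\gamma_m)} \sum_{1 \leq i_1, \ldots, i_\gamma \leq n,\ \text{distinct}} 1 = \gamma!\,\gamma! \binom{n}{\gamma}.$$
Let $\mathcal{K}$ be those indices $(i_1, \ldots, i_\gamma)$, $1 \leq i_1, \ldots, i_\gamma \leq n$, which are not distinct, that is, $i_j = i_k$ for some $j \neq k$. Then,
$$\frac{1}{\gamma!\, n^\gamma} \sum_{\sigma \in \mathbb{S}(\gamma_1,\ldots,\gamma_m)} \sum_{(i_1,\ldots,i_\gamma) \in \mathcal{K}} \mathbb{E}_\pi\Big[\prod_{l=1}^{\gamma} 1_{\sigma_l}(X_{i_l})\Big] \leq \frac{1}{\gamma!\, n^\gamma}\Big(\gamma!\, n^\gamma - \gamma!\,\gamma!\binom{n}{\gamma}\Big) = o(1).$$
But, as each set of distinct indices can be ordered in $\gamma!$ ways, and reordering the indices only permutes $\mathbb{S}(\gamma_1,\ldots,\gamma_m)$,
$$\frac{1}{\gamma!\, n^\gamma} \sum_{\sigma \in \mathbb{S}(\gamma_1,\ldots,\gamma_m)} \sum_{1 \leq i_1, \ldots, i_\gamma \leq n,\ \text{distinct}} \mathbb{E}_\pi\Big[\prod_{l=1}^{\gamma} 1_{\sigma_l}(X_{i_l})\Big] = \frac{1}{n^\gamma} \sum_{\sigma \in \mathbb{S}(\gamma_1,\ldots,\gamma_m)} \sum_{i_1=1}^{n-\gamma+1} \sum_{i_2 > i_1}^{n-\gamma+2} \cdots \sum_{i_\gamma > i_{\gamma-1}}^{n} \mathbb{E}_\pi\Big[\prod_{l=1}^{\gamma} 1_{\sigma_l}(X_{i_l})\Big]. \qquad \square$$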



The next lemma replaces the initial measure with $\nu_G$. Let $P_{i,j} = \prod_{l=i+1}^{j} P_l$ for $1 \leq i < j$.



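Lemma 3.3 For $G \in \mathbb{G}$ and initial distribution $\pi$, we have
$$\lim_{n\to\infty} \Bigg| \frac{1}{n^\gamma} \sum_{\sigma \in \mathbb{S}(\gamma_1,\ldots,\gamma_m)} \sum_{i_1=1}^{n-\gamma+1} \sum_{i_2 > i_1}^{n-\gamma+2} \cdots \sum_{i_\gamma > i_{\gamma-1}}^{n} \mathbb{E}_\pi\Big[\prod_{l=1}^{\gamma} 1_{\sigma_l}(X_{i_l})\Big] - \frac{1}{n^\gamma} \sum_{\sigma \in \mathbb{S}(\gamma_1,\ldots,\gamma_m)} \nu_G(\sigma_1) \sum_{i_1=1}^{n-\gamma+1} \sum_{i_2 > i_1}^{n-\gamma+2} \cdots \sum_{i_\gamma > i_{\gamma-1}}^{n} \prod_{l=1}^{\gamma-1} P_{i_l,i_{l+1}}(\sigma_l, \sigma_{l+1}) \Bigg| = 0. \qquad (3.1)$$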

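Proof. As $\mathbb{P}_\pi(X_j = t \mid X_i = s) = P_{i,j}(s,t)$ for $1 \leq i < j$ and $s,t \in S$, we have
$$\frac{1}{n^\gamma} \sum_{\sigma \in \mathbb{S}(\gamma_1,\ldots,\gamma_m)} \sum_{i_1=1}^{n-\gamma+1} \sum_{i_2 > i_1}^{n-\gamma+2} \cdots \sum_{i_\gamma > i_{\gamma-1}}^{n} \mathbb{E}_\pi\Big[\prod_{l=1}^{\gamma} 1_{\sigma_l}(X_{i_l})\Big] = \frac{1}{n^\gamma} \sum_{\sigma \in \mathbb{S}(\gamma_1,\ldots,\gamma_m)} \sum_{i_1=1}^{n-\gamma+1} \mathbb{P}_\pi(X_{i_1} = \sigma_1) \sum_{i_2 > i_1}^{n-\gamma+2} \cdots \sum_{i_\gamma > i_{\gamma-1}}^{n} \prod_{l=1}^{\gamma-1} P_{i_l,i_{l+1}}(\sigma_l, \sigma_{l+1}),$$
which, as the inner sums have at most $n^{\gamma-1}$ terms bounded by 1, differs from the second expression in (3.1) by at most
$$\sum_{\sigma \in \mathbb{S}(\gamma_1,\ldots,\gamma_m)} \frac{1}{n} \sum_{i_1=1}^{n-\gamma+1} \big|\mathbb{P}_\pi(X_{i_1} = \sigma_1) - \nu_G(\sigma_1)\big|,$$
which vanishes by Theorem 1.1. □

We now focus on a useful class of diagonalizable matrices,
$$\mathbb{G}^* = \big\{G : \mathrm{Re}(\lambda_l) < 1 \text{ for } 1 \leq l \leq m, \text{ and } G \text{ is diagonalizable}\big\},$$
where $\{\lambda_l\}$ are the eigenvalues of $G$. As $\mathrm{Re}(\lambda_l) \leq 0$ for $1 \leq l \leq m$ when $G \in \mathbb{G}$, certainly all diagonalizable $G \in \mathbb{G}$ belong to $\mathbb{G}^*$. The relevance of this class, in the subsequent arguments, is that for $G \in \mathbb{G}^*$ the resolvent $(xI - G)^{-1}$ exists for $x \geq 1$.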

For $G \in \mathbb{G}^*$, let $V_G$ be the matrix of eigenvectors and $D_G$ be a diagonal matrix with corresponding eigenvalue entries $D_G(i,i) = \lambda_i$, so that $G = V_G D_G V_G^{-1}$. Define also, for $1 \leq s,t,k \leq m$,
$$g(k;s,t) = V_G(s,k)\, V_G^{-1}(k,t).$$
We also denote, for $a_1, \ldots, a_m \in \mathbb{C}$, by $\mathrm{Diag}(a_\cdot)$ the diagonal matrix with $i$th diagonal entry $a_i$, for $1 \leq i \leq m$. We also extend the definitions of $P_n$ and $P_{i,j}$ to $G \in \mathbb{G}^*$ with the same formulas. In the following, we use the principal value of the complex logarithm, and the usual convention $a^{b+ic} = e^{(b+ic)\log(a)}$ for $b,c \in \mathbb{R}$ with $a > 0$.

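Lemma 3.4 For $G \in \mathbb{G}^*$, $s,t \in S$, and $C \leq i < j$, where $C = C(G)$ is a large enough constant,
$$P_{i,j}(s,t) = \sum_{k=1}^m \nu(k;i,j) \Big(\frac{j}{i}\Big)^{\lambda_k} g(k;s,t);$$
moreover, $\nu(k;i,j) \to 1$ as $i \uparrow \infty$, uniformly over $k$ and $j$.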
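Proof. Straightforwardly, for $C \leq i < j$,
$$P_{i,j} = \prod_{k=i+1}^{j} \Big(I + \frac{G}{k}\Big) = V_G\, \mathrm{Diag}\Big(\prod_{k=i+1}^{j} \Big(1 + \frac{\lambda_\cdot}{k}\Big)\Big) V_G^{-1}.$$
To expand further, we note that, for $z \in \mathbb{C}$ such that $|z - 1| < 1$, we have
$$\log(z) = \sum_{n=0}^{\infty} \frac{(-1)^n}{n+1} (z-1)^{n+1},$$
and estimate
$$\big|\log(z) - (z-1)\big| = \Big|(z-1)^2 \sum_{n=0}^{\infty} \frac{(-1)^{n+1}}{n+2} (z-1)^n\Big| \leq |z-1|^2 \sum_{n=0}^{\infty} |z-1|^n = \frac{|z-1|^2}{1 - |z-1|}.$$
Let now $L$ be so large that $\max_{1 \leq u \leq m} |\lambda_u|/L < 1/2$. Then, for $1 \leq s \leq m$ and $k \geq L$,
$$\log\Big(1 + \frac{\lambda_s}{k}\Big) = \frac{\lambda_s}{k} + \frac{\lambda_s^2}{k^2} C_{s,k}$$
for some $C_{s,k} \in \mathbb{C}$ with $|C_{s,k}| \leq (1 - \max_{1 \leq u \leq m} |\lambda_u|/L)^{-1} < 2$. Then, for $i \geq L$,
$$\prod_{k=i+1}^{j} \Big(1 + \frac{\lambda_s}{k}\Big) = \exp\Big(\sum_{k=i+1}^{j} \log\Big(1 + \frac{\lambda_s}{k}\Big)\Big) = \exp\Big(\sum_{k=i+1}^{j} \frac{\lambda_s}{k} + c(s;i,j)\Big),$$
where $c(s;i,j) = \sum_{k=i+1}^{j} \lambda_s^2 C_{s,k}/k^2$ satisfies
$$|c(s;i,j)| \leq 2 \max_{1 \leq u \leq m} |\lambda_u|^2 \sum_{k=i+1}^{\infty} \frac{1}{k^2} \to 0$$
uniformly over $s$ and $j$ as $i \uparrow \infty$. Let now
$$d(s;i,j) = \lambda_s \Big(\sum_{k=i+1}^{j} \frac{1}{k} - \log\frac{j}{i}\Big),$$
and note, by the simple estimate
$$0 \leq \log\frac{j}{i} - \sum_{k=i+1}^{j} \frac{1}{k} \leq \frac{1}{i},$$
that $d(s;i,j) \to 0$ uniformly over $j$ and $s$ as $i \uparrow \infty$. This allows us to write
$$\prod_{k=i+1}^{j} \Big(1 + \frac{\lambda_s}{k}\Big) = \exp\big(c(s;i,j) + d(s;i,j)\big) \Big(\frac{j}{i}\Big)^{\lambda_s}.$$
Defining $\nu(s;i,j) = \exp(c(s;i,j) + d(s;i,j))$ gives, after multiplying out, that
$$P_{i,j}(s,t) = \Big[V_G\, \mathrm{Diag}\Big(\nu(\cdot\,;i,j) \Big(\frac{j}{i}\Big)^{\lambda_\cdot}\Big) V_G^{-1}\Big](s,t) = \sum_{k=1}^m \nu(k;i,j) \Big(\frac{j}{i}\Big)^{\lambda_k} g(k;s,t),$$
completing the proof.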
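The scalar heart of Lemma 3.4 is that $\prod_{k=i+1}^{j}(1 + \lambda/k)$ is close to $(j/i)^{\lambda}$ once $i$ is large. A quick numerical sketch (our own helper names; the eigenvalue below is hypothetical) shows the correction factor $\nu(s;i,j)$ approaching 1:

```python
def product_factor(lam, i, j):
    """prod_{k=i+1}^{j} (1 + lam/k) for a (possibly complex) eigenvalue lam."""
    out = complex(1.0)
    for k in range(i + 1, j + 1):
        out *= 1 + lam / k
    return out

def nu_factor(lam, i, j):
    """The correction nu(s; i, j) of Lemma 3.4: the product divided by (j/i)^lam."""
    return product_factor(lam, i, j) / (j / i) ** lam

lam = -0.7 + 0.3j  # hypothetical eigenvalue with negative real part
for i in (10, 100, 1000):
    print(i, abs(nu_factor(lam, i, 50 * i) - 1))  # shrinks roughly like 1/i
```

Python evaluates `(j / i) ** lam` with the principal logarithm, matching the convention $a^{b+ic} = e^{(b+ic)\log a}$ adopted above.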

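To continue, define for $G \in \mathbb{G}^*$ the function $T_{x,y}(s,t) : (0,1]^2 \to \mathbb{C}$ by
$$T_{x,y}(s,t) = \sum_{k=1}^m \Big(\frac{y}{x}\Big)^{\lambda_k} g(k;s,t).$$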
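Lemma 3.5 For $G \in \mathbb{G}$,
$$\lim_{\epsilon \downarrow 0} \limsup_{n\to\infty} \frac{1}{n^\gamma} \sum_{\sigma \in \mathbb{S}(\gamma_1,\ldots,\gamma_m)} \sum_{i_1=1}^{\lfloor n\epsilon \rfloor} \sum_{i_2 > i_1}^{n-\gamma+2} \cdots \sum_{i_\gamma > i_{\gamma-1}}^{n} \prod_{l=1}^{\gamma-1} P_{i_l,i_{l+1}}(\sigma_l, \sigma_{l+1}) = 0.$$
Proof. For any $\sigma \in \mathbb{S}(\gamma_1, \ldots, \gamma_m)$, as the entries of $P_{i,j}$ are bounded by 1,
$$\lim_{\epsilon \downarrow 0} \limsup_{n\to\infty} \frac{1}{n^\gamma} \sum_{i_1=1}^{\lfloor n\epsilon \rfloor} \sum_{i_2 > i_1}^{n-\gamma+2} \cdots \sum_{i_\gamma > i_{\gamma-1}}^{n} \prod_{l=1}^{\gamma-1} P_{i_l,i_{l+1}}(\sigma_l, \sigma_{l+1}) \leq \lim_{\epsilon \downarrow 0} \limsup_{n\to\infty} \frac{1}{n^\gamma} (n\epsilon)\, n^{\gamma-1} = 0. \qquad \square$$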

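Lemma 3.6 For $G \in \mathbb{G}^*$, $\sigma \in \mathbb{S}(\gamma_1, \ldots, \gamma_m)$, and $\epsilon > 0$,
$$\lim_{n\to\infty} \frac{1}{n^\gamma} \sum_{i_1=\lfloor n\epsilon \rfloor+1}^{n-\gamma+1} \sum_{i_2 > i_1}^{n-\gamma+2} \cdots \sum_{i_\gamma > i_{\gamma-1}}^{n} \prod_{l=1}^{\gamma-1} P_{i_l,i_{l+1}}(\sigma_l, \sigma_{l+1}) = \int_{\epsilon < x_1 < x_2 < \cdots < x_\gamma < 1} \prod_{l=1}^{\gamma-1} T_{x_l,x_{l+1}}(\sigma_l, \sigma_{l+1})\, dx_1 dx_2 \cdots dx_\gamma.$$
Proof. From Lemma 3.4, as $\nu(s;i,j) \to 1$ as $i \uparrow \infty$ uniformly over $j$ and $s$, and as $T_{x,y}(s,t)$ is bounded and continuous on $[\epsilon,1]^2$ for fixed $s,t$, Riemann convergence gives
$$\lim_{n\to\infty} \frac{1}{n^\gamma} \sum_{i_1=\lfloor n\epsilon \rfloor+1}^{n-\gamma+1} \sum_{i_2 > i_1}^{n-\gamma+2} \cdots \sum_{i_\gamma > i_{\gamma-1}}^{n} \prod_{l=1}^{\gamma-1} P_{i_l,i_{l+1}}(\sigma_l, \sigma_{l+1}) = \lim_{n\to\infty} \frac{1}{n^\gamma} \sum_{i_1=\lfloor n\epsilon \rfloor+1}^{n-\gamma+1} \sum_{i_2 > i_1}^{n-\gamma+2} \cdots \sum_{i_\gamma > i_{\gamma-1}}^{n} \prod_{l=1}^{\gamma-1} \sum_{k=1}^m \Big(\frac{i_{l+1}}{i_l}\Big)^{\lambda_k} g(k;\sigma_l,\sigma_{l+1})$$
$$= \int_{\epsilon < x_1 < x_2 < \cdots < x_\gamma < 1} \prod_{l=1}^{\gamma-1} T_{x_l,x_{l+1}}(\sigma_l, \sigma_{l+1})\, dx_1 dx_2 \cdots dx_\gamma. \qquad \square$$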

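Lemma 3.7 For $G \in \mathbb{G}^*$ and $\sigma \in \mathbb{S}(\gamma_1, \ldots, \gamma_m)$,
$$\lim_{\epsilon \downarrow 0} \int_{\epsilon < x_1 < x_2 < \cdots < x_\gamma < 1} \prod_{l=1}^{\gamma-1} T_{x_l,x_{l+1}}(\sigma_l, \sigma_{l+1})\, dx_1 dx_2 \cdots dx_\gamma = \int_0^1 \int_0^{x_\gamma} \cdots \int_0^{x_2} T_{x_{\gamma-1},x_\gamma}(\sigma_{\gamma-1}, \sigma_\gamma) \cdots T_{x_1,x_2}(\sigma_1, \sigma_2)\, dx_1 dx_2 \cdots dx_\gamma.$$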
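Proof. Let
$$f_\epsilon = 1_{\{\epsilon < x_1 < x_2 < \cdots < x_\gamma < 1\}} \prod_{l=1}^{\gamma-1} T_{x_l,x_{l+1}}(\sigma_l, \sigma_{l+1}).$$
Then,
$$\lim_{\epsilon \downarrow 0} f_\epsilon = 1_{\{0 < x_1 < x_2 < \cdots < x_\gamma < 1\}} \prod_{l=1}^{\gamma-1} T_{x_l,x_{l+1}}(\sigma_l, \sigma_{l+1}),$$
and $f_\epsilon$ is uniformly bounded over $\epsilon$, as
$$|f_\epsilon| \leq f = 1_{\{0 < x_1 < x_2 < \cdots < x_\gamma < 1\}} \prod_{l=1}^{\gamma-1} \sum_{k=1}^m |g(k;\sigma_l,\sigma_{l+1})| \Big(\frac{x_{l+1}}{x_l}\Big)^{\mathrm{Re}(\lambda_k)}.$$
The right-hand bound is integrable: indeed, by Tonelli's Lemma and induction, as $\mathrm{Re}(\lambda_k) < 1$, we have
$$\int f\, dx_1 \cdots dx_\gamma = \int_0^1 \int_0^{x_\gamma} \cdots \int_0^{x_2} \prod_{l=1}^{\gamma-1} \sum_{k=1}^m |g(k;\sigma_l,\sigma_{l+1})| \Big(\frac{x_{l+1}}{x_l}\Big)^{\mathrm{Re}(\lambda_k)} dx_1 \cdots dx_\gamma = \frac{1}{\gamma} \prod_{l=1}^{\gamma-1} \sum_{k=1}^m \frac{|g(k;\sigma_l,\sigma_{l+1})|}{l - \mathrm{Re}(\lambda_k)} < \infty.$$
Hence, the lemma follows by dominated convergence and Fubini's Theorem. □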
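Lemma 3.8 For $G \in \mathbb{G}^*$ and $\sigma \in \mathbb{S}(\gamma_1, \ldots, \gamma_m)$,
$$\int_0^1 \int_0^{x_\gamma} \cdots \int_0^{x_2} \prod_{l=1}^{\gamma-1} T_{x_l,x_{l+1}}(\sigma_l, \sigma_{l+1})\, dx_1 \cdots dx_\gamma = \frac{1}{\gamma} \prod_{l=1}^{\gamma-1} (lI - G)^{-1}(\sigma_l, \sigma_{l+1}).$$
Proof. By induction, the integral equals
$$\frac{1}{\gamma} \prod_{l=1}^{\gamma-1} \Big(\sum_{k=1}^m \frac{g(k;\sigma_l,\sigma_{l+1})}{l - \lambda_k}\Big).$$
However, for $x \geq 1$, we have
$$(xI - G)^{-1}(s,t) = V_G (xI - D_G)^{-1} V_G^{-1}(s,t) = \sum_{k=1}^m \frac{g(k;s,t)}{x - \lambda_k},$$
to finish the identification. □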
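For a single eigenvalue $\lambda$ with $\mathrm{Re}(\lambda) < 1$, as guaranteed for $G \in \mathbb{G}^*$, the induction in this proof rests on the following elementary integrals (a worked sketch). Weighting the $l$th step by $g(k;\sigma_l,\sigma_{l+1})$ and summing over $k$ turns $1/(l - \lambda_k)$ into $(lI - G)^{-1}(\sigma_l,\sigma_{l+1})$ by the resolvent expansion.

```latex
\begin{aligned}
\int_0^{x_2}\Big(\frac{x_2}{x_1}\Big)^{\lambda}dx_1 &= x_2^{\lambda}\,\frac{x_2^{1-\lambda}}{1-\lambda} = \frac{x_2}{1-\lambda},\\
\int_0^{x_{l+1}} x_l^{\,l-1}\Big(\frac{x_{l+1}}{x_l}\Big)^{\lambda}dx_l &= x_{l+1}^{\lambda}\,\frac{x_{l+1}^{\,l-\lambda}}{l-\lambda} = \frac{x_{l+1}^{\,l}}{l-\lambda}, \qquad 2 \le l \le \gamma-1,\\
\int_0^1 x_\gamma^{\gamma-1}\,dx_\gamma &= \frac{1}{\gamma}.
\end{aligned}
```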

At this point, by straightforwardly combining the previous lemmas, we have proved Theorem 1.3 for diagonalizable $G \in \mathbb{G}$. The extension to non-diagonalizable generators is accomplished by approximating with suitable "lower" and "upper" diagonalizable matrices.
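Lemma 3.9 For $G \in \mathbb{G}$,
$$\lim_{n\to\infty} \frac{1}{n^\gamma} \sum_{\sigma \in \mathbb{S}(\gamma_1,\ldots,\gamma_m)} \nu_G(\sigma_1) \sum_{i_1=1}^{n-\gamma+1} \sum_{i_2 > i_1}^{n-\gamma+2} \cdots \sum_{i_\gamma > i_{\gamma-1}}^{n} \prod_{l=1}^{\gamma-1} P_{i_l,i_{l+1}}(\sigma_l, \sigma_{l+1}) = \frac{1}{\gamma} \sum_{\sigma \in \mathbb{S}(\gamma_1,\ldots,\gamma_m)} \nu_G(\sigma_1) \prod_{l=1}^{\gamma-1} (lI - G)^{-1}(\sigma_l, \sigma_{l+1}). \qquad (3.2)$$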

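Proof. For an $m \times m$ matrix $A$, let $G[A] = G + A$. Let $\|\cdot\|_M$ be the matrix norm $\|A\|_M = \max\{|A(s,t)| : 1 \leq s,t \leq m\}$. Now, for small $\epsilon > 0$, choose matrices $A_1$ and $A_2$ with non-negative entries so that $\|A_1\|_M, \|A_2\|_M < \epsilon$, $I + G[-A_1]/l$ and $I + G[A_2]/l$ have positive entries for all $l$ large enough, and $G[-A_1], G[A_2] \in \mathbb{G}^*$. This last condition can be met as (1) the spectrum varies continuously with respect to the matrix norm $\|\cdot\|_M$ (cf. Appendix D [13]), and (2) diagonalizable real matrices are dense (cf. Theorem 1 [?]).

Then, for $s,t \in S$, and $l$ large enough, we have $0 < (I + G[-A_1]/l)(s,t) \leq (I + G/l)(s,t) \leq (I + G[A_2]/l)(s,t)$. Hence, for $i < j$ with $i$ large enough, entrywise
$$P_{i,j}^{G[-A_1]}(s,t) \leq P_{i,j}^{G}(s,t) \leq P_{i,j}^{G[A_2]}(s,t).$$
By Lemmas 3.5, 3.6, 3.7 and 3.8, the left side of (3.2), in terms of liminf and limsup, is bounded below and above by
$$\frac{1}{\gamma} \sum_{\sigma \in \mathbb{S}(\gamma_1,\ldots,\gamma_m)} \nu_G(\sigma_1) \prod_{l=1}^{\gamma-1} (lI - G[-A_1])^{-1}(\sigma_l, \sigma_{l+1}) \quad\text{and}\quad \frac{1}{\gamma} \sum_{\sigma \in \mathbb{S}(\gamma_1,\ldots,\gamma_m)} \nu_G(\sigma_1) \prod_{l=1}^{\gamma-1} (lI - G[A_2])^{-1}(\sigma_l, \sigma_{l+1}),$$
respectively. On the other hand, for $\sigma \in \mathbb{S}(\gamma_1, \ldots, \gamma_m)$, both
$$\prod_{l=1}^{\gamma-1} (lI - G[-A_1])^{-1}(\sigma_l, \sigma_{l+1}),\quad \prod_{l=1}^{\gamma-1} (lI - G[A_2])^{-1}(\sigma_l, \sigma_{l+1}) \ \to\ \prod_{l=1}^{\gamma-1} (lI - G)^{-1}(\sigma_l, \sigma_{l+1})$$
as $\epsilon \to 0$, completing the proof. □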



4 Proof of Theorem 1.4 



The proof follows by evaluating the moment expressions in Theorem 1.3 when $G = G_\Theta$ as those corresponding to the Dirichlet distribution with parameters $\theta_1, \ldots, \theta_m$ in (1.1).
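Lemma 4.1 The stationary distribution $\nu_{G_\Theta}$ is given by $\nu_{G_\Theta}(i) = \theta_i/\theta$ for $1 \leq i \leq m$. Also, for $2 \leq l \leq \gamma$, let $F_l$ be the $m \times m$ matrix with entries
$$F_l(j,k) = \begin{cases} \theta_k & \text{for } k \neq j, \\ \theta_j + l - 1 & \text{for } k = j. \end{cases}$$
Then,
$$\big((l-1)I - G_\Theta\big)^{-1} = \frac{F_l}{(l-1)(\theta + l - 1)}.$$
Proof. The form of $\nu_{G_\Theta}$ follows by inspection. For the second statement, write $F_{l+1} = lI + Q$, where the matrix $Q$ has $i$th column equal to $\theta_i(1, \ldots, 1)^t$. Then, also $G_\Theta = Q - \theta I$. As $(1, \ldots, 1)^t$ is an eigenvector of $G_\Theta$ with eigenvalue 0 (so that $G_\Theta Q = 0$), we see $(lI - G_\Theta)(lI + Q) = l(l + \theta)I$, finishing the proof. □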
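The resolvent identity of Lemma 4.1 can be checked exactly for a concrete choice of parameters. The sketch below (our own helper names; the $\theta$ values are arbitrary) verifies $(lI - G_\Theta)F_{l+1} = l(l+\theta)I$ for several $l$ with rational arithmetic.

```python
from fractions import Fraction

def G_theta(theta):
    """G_Theta: G(i, j) = theta_j off the diagonal, G(i, i) = theta_i - theta."""
    total, m = sum(theta), len(theta)
    return [[theta[j] - (total if i == j else 0) for j in range(m)] for i in range(m)]

def F(theta, l):
    """F_l of Lemma 4.1: theta_k off the diagonal, theta_j + l - 1 on it."""
    m = len(theta)
    return [[theta[k] + (l - 1 if j == k else 0) for k in range(m)] for j in range(m)]

def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

theta = [Fraction(1, 2), Fraction(2), Fraction(7, 3)]
total, m, G = sum(theta), len(theta), G_theta(theta)
for l in range(1, 8):
    lhs = matmul([[l * (i == j) - G[i][j] for j in range(m)] for i in range(m)], F(theta, l + 1))
    # (lI - G_Theta) F_{l+1} = l (l + theta) I, i.e. (lI - G_Theta)^{-1} = F_{l+1} / (l (l + theta)).
    assert lhs == [[l * (l + total) * (i == j) for j in range(m)] for i in range(m)]
print("Lemma 4.1 identity verified for l = 1, ..., 7")
```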

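The next statement is an immediate corollary of Theorem 1.3 and Lemma 4.1.

Lemma 4.2 The $\mu_{G_\Theta}$-moments satisfy $E^{\mu_{G_\Theta}}[x_i] = \theta_i/\theta$ for $1 \leq i \leq m$ and, when $\gamma \geq 2$,
$$E^{\mu_{G_\Theta}}\big[x_1^{\gamma_1} \cdots x_m^{\gamma_m}\big] = \frac{1}{\gamma} \prod_{l=1}^{\gamma-1} \big(l(\theta + l)\big)^{-1} \sum_{\sigma \in \mathbb{S}(\gamma_1,\ldots,\gamma_m)} \frac{\theta_{\sigma_1}}{\theta} \prod_{l=2}^{\gamma} F_l(\sigma_{l-1}, \sigma_l) = \frac{1}{\gamma! \prod_{l=0}^{\gamma-1} (\theta + l)} \sum_{\sigma \in \mathbb{S}(\gamma_1,\ldots,\gamma_m)} \theta_{\sigma_1} \prod_{l=2}^{\gamma} F_l(\sigma_{l-1}, \sigma_l).$$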


We now evaluate the last expression of Lemma 4.2 by first specifying the value of $\sigma_\gamma$. Recall, by convention, $\theta_l \cdots (\theta_l + \gamma_l - 1) = 1$ when $\gamma_l = 0$, for $1 \leq l \leq m$.

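Lemma 4.3 For $\gamma \geq 2$ and $1 \leq k \leq m$,
$$\sum_{\substack{\sigma \in \mathbb{S}(\gamma_1,\ldots,\gamma_m) \\ \sigma_\gamma = k}} \theta_{\sigma_1} \prod_{l=2}^{\gamma} F_l(\sigma_{l-1}, \sigma_l) = \gamma_k (\gamma - 1)! \prod_{l=1}^m \theta_l \cdots (\theta_l + \gamma_l - 1). \qquad (4.1)$$
Proof. The proof will be by induction on $\gamma$.

Base step: $\gamma = 2$. If $\gamma_k = 1$ and $\gamma_i = 1$ for $i \neq k$, the left and right sides of (4.1) both equal $\theta_i F_2(i,k) = \theta_i \theta_k$. If $\gamma_k = 2$, then the left and right sides of (4.1) both equal $2\theta_k F_2(k,k) = 2\theta_k(\theta_k + 1)$.

Induction step. Without loss of generality and to ease notation, let $k = 1$. Then, by specifying the next-to-last element $\sigma_{\gamma-1}$, and simple counting, we have
$$\sum_{\substack{\sigma \in \mathbb{S}(\gamma_1,\ldots,\gamma_m) \\ \sigma_\gamma = 1}} \theta_{\sigma_1} \prod_{l=2}^{\gamma} F_l(\sigma_{l-1}, \sigma_l) = \gamma_1 (\theta_1 + \gamma - 1) \sum_{\substack{\sigma \in \mathbb{S}(\gamma_1 - 1,\ldots,\gamma_m) \\ \sigma_{\gamma-1} = 1}} \theta_{\sigma_1} \prod_{l=2}^{\gamma-1} F_l(\sigma_{l-1}, \sigma_l) + \sum_{j=2}^m \gamma_1 \theta_1 \sum_{\substack{\sigma \in \mathbb{S}(\gamma_1 - 1,\ldots,\gamma_m) \\ \sigma_{\gamma-1} = j}} \theta_{\sigma_1} \prod_{l=2}^{\gamma-1} F_l(\sigma_{l-1}, \sigma_l).$$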
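We now use induction to evaluate the right side above as
$$\theta_1 \cdots (\theta_1 + \gamma_1 - 2) \prod_{i=2}^m \theta_i \cdots (\theta_i + \gamma_i - 1) \times \Big\{\gamma_1 (\theta_1 + \gamma - 1)(\gamma_1 - 1)(\gamma - 2)! + \sum_{i=2}^m \gamma_1 \theta_1 \gamma_i (\gamma - 2)!\Big\}$$
$$= \theta_1 \cdots (\theta_1 + \gamma_1 - 2) \prod_{i=2}^m \theta_i \cdots (\theta_i + \gamma_i - 1) \times \Big\{\gamma_1 (\theta_1 + \gamma - 1)(\gamma_1 - 1)(\gamma - 2)! + \gamma_1 \theta_1 (\gamma - \gamma_1)(\gamma - 2)!\Big\}$$
$$= \theta_1 \cdots (\theta_1 + \gamma_1 - 2) \prod_{i=2}^m \theta_i \cdots (\theta_i + \gamma_i - 1) \times \gamma_1 (\gamma - 2)!\, (\theta_1 + \gamma_1 - 1)(\gamma - 1)$$
$$= \gamma_1 (\gamma - 1)! \prod_{l=1}^m \theta_l \cdots (\theta_l + \gamma_l - 1). \qquad \square$$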
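Identity (4.1) can also be confirmed by brute force for small $\gamma$. The sketch below (our own helper names; states indexed from 0; arbitrary rational $\theta$'s) compares both sides exactly.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def rising(a, n):
    """a (a + 1) ... (a + n - 1), with the empty product equal to 1."""
    out = Fraction(1)
    for r in range(n):
        out *= a + r
    return out

def lhs_41(theta, gammas, k):
    """Left side of (4.1), with states indexed from 0."""
    A = [s for s, g in enumerate(gammas) for _ in range(g)]
    total = Fraction(0)
    for sigma in permutations(A):  # all of S(gammas), repeats included
        if sigma[-1] != k:
            continue
        term = Fraction(theta[sigma[0]])
        for l in range(2, len(sigma) + 1):
            a, b = sigma[l - 2], sigma[l - 1]  # F_l(sigma_{l-1}, sigma_l)
            term *= theta[b] + (l - 1 if a == b else 0)
        total += term
    return total

def rhs_41(theta, gammas, k):
    """Right side of (4.1): gamma_k (gamma - 1)! prod_l theta_l ... (theta_l + gamma_l - 1)."""
    out = Fraction(gammas[k] * factorial(sum(gammas) - 1))
    for t, g in zip(theta, gammas):
        out *= rising(t, g)
    return out

theta = [Fraction(1, 2), Fraction(3), Fraction(5, 4)]
for gammas in [(1, 1, 0), (2, 1, 0), (1, 1, 1), (2, 2, 1), (0, 3, 1)]:
    for k in range(3):
        assert lhs_41(theta, gammas, k) == rhs_41(theta, gammas, k)
print("(4.1) verified on the sample cases")
```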



By now adding over $1 \leq k \leq m$ in the previous lemma, we finish the proof of Theorem 1.4.



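Lemma 4.4 When $\gamma \geq 2$,
$$\frac{1}{\gamma! \prod_{l=0}^{\gamma-1} (\theta + l)} \sum_{\sigma \in \mathbb{S}(\gamma_1,\ldots,\gamma_m)} \theta_{\sigma_1} \prod_{l=2}^{\gamma} F_l(\sigma_{l-1}, \sigma_l) = \frac{\prod_{l=1}^m \theta_l \cdots (\theta_l + \gamma_l - 1)}{\prod_{l=0}^{\gamma-1} (\theta + l)}.$$
Proof. By Lemma 4.3, summed over $1 \leq k \leq m$,
$$\frac{1}{\gamma! \prod_{l=0}^{\gamma-1} (\theta + l)} \sum_{\sigma \in \mathbb{S}(\gamma_1,\ldots,\gamma_m)} \theta_{\sigma_1} \prod_{l=2}^{\gamma} F_l(\sigma_{l-1}, \sigma_l) = \frac{\sum_{k=1}^m \gamma_k (\gamma - 1)! \prod_{l=1}^m \theta_l \cdots (\theta_l + \gamma_l - 1)}{\gamma! \prod_{l=0}^{\gamma-1} (\theta + l)} = \frac{\prod_{l=1}^m \theta_l \cdots (\theta_l + \gamma_l - 1)}{\prod_{l=0}^{\gamma-1} (\theta + l)}. \qquad \square$$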



5 Proof of Theorem 1.5 (1)



Let $p = (p_1, \ldots, p_m) \in \mathrm{Int}\,\Delta_m$ be a point in the simplex with $p_i > 0$ for $1 \leq i \leq m$. For $\epsilon > 0$ small, let $B(p,\epsilon) \subset \mathrm{Int}\,\Delta_m$ be a ball with radius $\epsilon$ and center $p$. To prove Theorem 1.5 (1), it is enough to show, for all large $n$, the lower bound
$$\mathbb{P}_\pi\big(Z_n \in B(p,\epsilon)\big) \geq C(p,\epsilon) > 0.$$

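To this end, let $\bar p_0 = 0$ and $\bar p_i = \sum_{l=1}^{i} p_l$ for $1 \leq i \leq m$. Also, define, for $1 \leq k \leq l$, $X_k^l = (X_k, \ldots, X_l)$. Then, there exist small $\delta, \beta > 0$ such that
$$\{Z_n \in B(p,\epsilon)\} \supset \bigcup_{0 \leq k_1, \ldots, k_{m-1} \leq \lfloor n\beta \rfloor} \ \bigcap_{j=1}^{m} \Big\{X^{\lfloor n\bar p_j \rfloor - k_j}_{\lfloor n\bar p_{j-1} \rfloor - k_{j-1} + 1} = \mathbf{j}\Big\}, \qquad (5.1)$$
where $\mathbf{j}$ is a vector, of the appropriate length, with all coordinates equal to $j$, and we take $1 - k_0 = \lfloor n\delta \rfloor$ and $k_m = 0$. The last event represents the process being in the fixed location $j$ for times $\lfloor n\bar p_{j-1} \rfloor - k_{j-1} + 1$ to $\lfloor n\bar p_j \rfloor - k_j$, for $1 \leq j \leq m$.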

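Now, as $G$ has strictly negative diagonal entries, $C_1 = \max_s |G(s,s)| > 0$, and so, for all large $n$,
$$\prod_{j=\lfloor n\delta \rfloor}^{n} \Big(1 - \frac{C_1}{j}\Big) \geq \frac{\delta^{C_1}}{2}.$$
Also, as $G$ has positive non-diagonal entries, $C_2 = \min_{1 \leq s < m} G(s, s+1) > 0$. Then, the probability of switching from location $j-1$ to location $j$ at time $\lfloor n\bar p_{j-1} \rfloor - k_{j-1} + 1$ is at least
$$\frac{C_2}{\lfloor n\bar p_{j-1} \rfloor - k_{j-1} + 1}.$$
Hence, for all large $n$, as $\mathbb{P}_\pi(X_{\lfloor n\delta \rfloor} = 1) > \nu_G(1)/2$ (Theorem 1.1), and as the events in the union (5.1) are disjoint,
$$\mathbb{P}_\pi\big(Z_n \in B(p,\epsilon)\big) \geq \frac{\nu_G(1)}{2} \cdot \frac{\delta^{C_1}}{2} \prod_{j=2}^{m} \sum_{k_{j-1}=0}^{\lfloor n\beta \rfloor} \frac{C_2}{\lfloor n\bar p_{j-1} \rfloor - k_{j-1} + 1} \geq \frac{\nu_G(1)\,\delta^{C_1}}{4} \prod_{j=2}^{m} C_2 \log \frac{\lfloor n\bar p_{j-1} \rfloor + 1}{\lfloor n\bar p_{j-1} \rfloor - \lfloor n\beta \rfloor + 1},$$
which is bounded below by a positive constant $C(p,\epsilon)$ for all large $n$. □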
6 Proof of Theorem 1.5 (2)



The proof of Theorem 1.5 (2) follows from the next two propositions.

Proposition 6.1 For $G \in \mathbb{G}$, the $m$ vertices $\mathbf{1}, \ldots, \mathbf{m}$ of $\Delta_m$ are not atoms.

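Proof. From Theorem 1.3, the moments $a_{l,k} = E^{\mu_G}[(x_l)^k]$ satisfy $a_{l,k+1} = (I - G/k)^{-1}(l,l)\, a_{l,k}$ for $1 \leq l \leq m$ and $k \geq 1$. By the inverse adjoint formula, for large $k$,
$$(I - G/k)^{-1}(l,l) = \frac{1 - \big(\mathrm{Tr}(G) - G(l,l)\big)/k + O(k^{-2})}{1 - \mathrm{Tr}(G)/k + O(k^{-2})} = 1 + \frac{G(l,l)}{k} + O(k^{-2}).$$
As $G \in \mathbb{G}$, $G(l,l) < 0$. Hence, $a_{l,k}$ vanishes at a polynomial rate, $a_{l,k} \sim C k^{G(l,l)}$. In particular, as $\mu_G(\{\mathbf{l}\}) \leq E^{\mu_G}[(x_l)^k] \to 0$ as $k \uparrow \infty$, the point $\mathbf{l}$ cannot be an atom of the limit distribution. □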
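The polynomial decay of the vertex moments can be observed numerically by iterating $a_{l,k+1} = (I - G/k)^{-1}(l,l)\,a_{l,k}$. In the sketch below (our own helper names; a hypothetical $3 \times 3$ generator), the diagonal resolvent entry is computed by the adjugate (Cramer) formula, and the log-log slope of $a_{1,k}$ between $k = 2000$ and $k = 4000$ should come out close to $G(1,1) = -3$.

```python
import math

def det(A):
    """Determinant by cofactor expansion along the first row (fine for small m)."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** c * A[0][c] * det([row[:c] + row[c + 1:] for row in A[1:]])
               for c in range(len(A)))

def resolvent_diag(G, k, l):
    """(I - G/k)^{-1}(l, l) via the adjugate: det(minor_{ll}) / det(I - G/k)."""
    m = len(G)
    M = [[(r == c) - G[r][c] / k for c in range(m)] for r in range(m)]
    minor = [[M[r][c] for c in range(m) if c != l] for r in range(m) if r != l]
    return det(minor) / det(M)

def log_vertex_moments(G, l, K):
    """log(a_{l,k} / a_{l,1}) for k = 1, ..., K, using a_{l,k+1} = (I - G/k)^{-1}(l,l) a_{l,k}."""
    logs = [0.0]
    for k in range(1, K):
        logs.append(logs[-1] + math.log(resolvent_diag(G, k, l)))
    return logs

G = [[-3, 1, 2], [2, -3, 1], [1, 2, -3]]  # hypothetical example generator
logs = log_vertex_moments(G, 0, 4000)
slope = (logs[3999] - logs[1999]) / math.log(2)
print(round(slope, 3))  # close to G(1,1) = -3: a_{1,k} decays like k^{G(1,1)}
```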

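Fix for the remainder $p \in \Delta_m \setminus \{\mathbf{1}, \ldots, \mathbf{m}\}$, and define $\underline p = \min\{p_i : p_i > 0,\ 1 \leq i \leq m\} > 0$. Let also $0 < \delta < \underline p/2$, and consider $B(p,\delta) = \{x \in \Delta_m : |p - x| < \delta\}$.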

Proposition 6.2 For $G \in \mathbb{G}$, there is a constant $C = C(G, \underline p, m)$ such that
$$\mu_G\big(B(p,\delta)\big) \leq C \log\Big(\frac{\underline p + 2\delta}{\underline p}\Big).$$



Before proving Proposition 6.2, we will need some notation and lemmas. We will say a "switch" occurs at time $1 < k \leq n$ in the sequence $\omega^n = (\omega_1, \ldots, \omega_n) \in S^n$ if $\omega_{k-1} \neq \omega_k$. For $0 \leq j \leq n - 1$, let
$$T(j) = \{\omega^n : \omega^n \text{ has exactly } j \text{ switches}\}.$$



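Note that, as $p \in \Delta_m \setminus \{\mathbf{1}, \ldots, \mathbf{m}\}$, at least two coordinates of $p$ are positive. Then, as $\delta < \underline p/2$, when $(1/n)\sum_{i=1}^n (1_1(\omega_i), \ldots, 1_m(\omega_i)) \in B(p,\delta)$, at least one switch is in $\omega^n$.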

For $j \geq 1$ and a path in $T(j)$, let $a_1, \ldots, a_j$ denote the $j$ switch times in the sequence; let also $\theta_1, \ldots, \theta_{j+1}$ be the $j+1$ locations visited by the sequence. We now partition $\{\omega^n : (1/n)\sum_{i=1}^n (1_1(\omega_i), \ldots, 1_m(\omega_i)) \in B(p,\delta)\} \cap T(j)$ into non-empty sets $A_j(\mathbf{U}, \mathbf{V})$, where $\mathbf{U} = (U_1, \ldots, U_{j-1})$ and $\mathbf{V} = (V_1, \ldots, V_{j+1})$ denote possible switch times (up to the $(j-1)$st switch time) and visit locations, respectively:
$$A_j(\mathbf{U}, \mathbf{V}) = \Big\{\omega^n : \omega^n \in T(j),\ \frac{1}{n}\sum_{i=1}^n (1_1(\omega_i), \ldots, 1_m(\omega_i)) \in B(p,\delta),\ a_i = U_i,\ \theta_k = V_k \text{ for } 1 \leq i \leq j-1,\ 1 \leq k \leq j+1\Big\}.$$
In this decomposition, paths in $A_j(\mathbf{U}, \mathbf{V})$ are in 1:1 correspondence with the $j$th switch times $a_j$, the only feature allowed to vary.
Now, for each set $A_j(\mathbf{U}, \mathbf{V})$, we define a path $\eta(j, \mathbf{U}, \mathbf{V}) = (\eta_1, \ldots, \eta_n)$ where the last, $j$th switch is "removed,"
$$\eta_l = \begin{cases} V_1 & \text{for } 1 \leq l < U_1, \\ V_k & \text{for } U_{k-1} \leq l < U_k,\ 2 \leq k \leq j-1, \\ V_j & \text{for } U_{j-1} \leq l \leq n. \end{cases}$$
Note that the sequence $\eta(j, \mathbf{U}, \mathbf{V})$ belongs to $T(j-1)$, can be obtained no matter the location $V_{j+1}$ (which could range over the $m$ values in the state space), and is in 1:1 correspondence with the pair $(U_1, \ldots, U_{j-1})$ and $(V_1, \ldots, V_j)$. In particular, recalling that $X^n = (X_1, \ldots, X_n)$ denotes the coordinate sequence up to time $n$, we have
$$\sum_{\mathbf{U}, \mathbf{V}} \mathbb{P}_\pi\big(X^n = \eta(j, \mathbf{U}, \mathbf{V})\big) \leq m\, \mathbb{P}_\pi\big(X^n \in T(j-1)\big), \qquad (6.1)$$
where the sum is over all $\mathbf{U}, \mathbf{V}$ corresponding to the decomposition into sets $A_j(\mathbf{U}, \mathbf{V})$ of $\{\omega^n : (1/n)\sum_{i=1}^n (1_1(\omega_i), \ldots, 1_m(\omega_i)) \in B(p,\delta)\} \cap T(j)$.

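The next lemma estimates the location of the last switch time $a_j$, and the size of the set $A_j(\mathbf{U}, \mathbf{V})$. The proof is deferred to the end.

Lemma 6.1 On $A_j(\mathbf{U}, \mathbf{V})$, we have $\lceil n(\underline p - \delta) + 1 \rceil \leq a_j$. Also, $|A_j(\mathbf{U}, \mathbf{V})| \leq \lfloor 2n\delta + 1 \rfloor$.

A consequence of these bounds on the position and cardinality of the $a_j$'s associated to a fixed set $A_j(\mathbf{U}, \mathbf{V})$ is that
$$\sum_{U_j} \frac{1}{U_j} \leq \sum_{k=\lceil n(\underline p - \delta) + 1 \rceil}^{\lceil n(\underline p - \delta) + 1 \rceil + \lfloor 2n\delta + 1 \rfloor} \frac{1}{k}, \qquad (6.2)$$
where $\sum_{U_j}$ refers to adding over all last switch times $U_j$ associated to paths in $A_j(\mathbf{U}, \mathbf{V})$. Let now $\bar G = \max\{|G(i,j)| : 1 \leq i,j \leq m\}$.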

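Lemma 6.2 For $\omega^n \in A_j(\mathbf{U}, \mathbf{V})$ such that $a_j = U_j$, and all large $n$, we have
$$\mathbb{P}_\pi(X^n = \omega^n) \leq \frac{\bar G\, (\underline p/2)^{-2\bar G}}{U_j}\, \mathbb{P}_\pi\big(X^n = \eta(j, \mathbf{U}, \mathbf{V})\big). \qquad (6.3)$$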



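Proof. The path $\eta(j, \mathbf{U}, \mathbf{V})$ differs from $\omega^n$ only in that there is no switch at time $U_j$. Hence,
$$\frac{\mathbb{P}_\pi(X^n = \omega^n)}{\mathbb{P}_\pi(X^n = \eta(j, \mathbf{U}, \mathbf{V}))} = \frac{G(V_j, V_{j+1})}{U_j\big(1 + G(V_j, V_j)/U_j\big)} \prod_{l=U_j+1}^{n} \frac{1 + G(V_{j+1}, V_{j+1})/l}{1 + G(V_j, V_j)/l}.$$
Now bounding $G(V_j, V_{j+1}) \leq \bar G$, $1 + G(V_{j+1}, V_{j+1})/l \leq 1$, $1 + G(V_j, V_j)/l \geq 1 - \bar G/l$, and noting $U_j \geq n(\underline p - \delta) + 1$ (by Lemma 6.1), $-\ln(1 - x) \leq 2x$ for $x > 0$ small, and $\delta < \underline p/2$, gives, for large $n$,
$$\frac{\mathbb{P}_\pi(X^n = \omega^n)}{\mathbb{P}_\pi(X^n = \eta(j, \mathbf{U}, \mathbf{V}))} \leq \frac{\bar G}{U_j} \prod_{l=U_j}^{n} \Big(1 - \frac{\bar G}{l}\Big)^{-1} \leq \frac{\bar G}{U_j} \exp\Big(2\bar G \sum_{l=U_j}^{n} \frac{1}{l}\Big) \leq \frac{\bar G}{U_j} \Big(\frac{n}{U_j - 1}\Big)^{2\bar G} \leq \frac{\bar G\, (\underline p/2)^{-2\bar G}}{U_j}. \qquad \square$$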

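Proof of Proposition 6.2. By decomposing over the number of switches $j$ and on the structure of the paths with $j$ switches, estimates (6.3) and (6.2), comment (6.1), and $\sum_j \mathbb{P}_\pi(X^n \in T(j-1)) \leq 1$, we have, for all large $n$,
$$\mathbb{P}_\pi\big(Z_n \in B(p,\delta)\big) = \sum_{j=1}^{n-1} \mathbb{P}_\pi\big(Z_n \in B(p,\delta),\ X^n \in T(j)\big) = \sum_{j=1}^{n-1} \sum_{\mathbf{U}, \mathbf{V}} \sum_{U_j} \mathbb{P}_\pi(X^n = \omega^n)$$
$$\leq C(G, \underline p) \log\Big(\frac{\underline p + 2\delta}{\underline p}\Big) \sum_{j=1}^{n-1} \sum_{\mathbf{U}, \mathbf{V}} \mathbb{P}_\pi\big(X^n = \eta(j, \mathbf{U}, \mathbf{V})\big) \leq m\, C(G, \underline p) \log\Big(\frac{\underline p + 2\delta}{\underline p}\Big) \sum_{j=1}^{n-1} \mathbb{P}_\pi\big(X^n \in T(j-1)\big) \leq C(G, \underline p, m) \log\Big(\frac{\underline p + 2\delta}{\underline p}\Big).$$
Here we used that, for large $n$, the sum in (6.2) is at most $4\delta/\underline p \leq 4\log((\underline p + 2\delta)/\underline p)$, as $\delta < \underline p/2$. The proposition follows by taking the limit in $n$, and weak convergence. □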

Proof of Lemma 6.1. For a path $\omega^n \in A_j(\mathbf{U}, \mathbf{V})$ and $1 \leq k \leq j+1$, let $r_k$ be the number of visits to state $V_k$ (some $r_k$'s may be the same if $V_k$ is repeated). For $1 \leq i \leq r_k$, let $s_i^k$ and $e_i^k$ be the start and end of the $i$th visit to $V_k$. Certainly, $\sum_{l=1}^n 1_{V_k}(\omega_l) = \sum_{i=1}^{r_k} (e_i^k - s_i^k + 1)$. Moreover, as $(1/n)\sum_{l=1}^n (1_1(\omega_l), \ldots, 1_m(\omega_l)) \in B(p,\delta)$, we have $|(1/n)\sum_{l=1}^n 1_{V_k}(\omega_l) - p_{V_k}| < \delta$, and so
$$n(p_{V_k} - \delta) < \sum_{i=1}^{r_k} (e_i^k - s_i^k + 1) < n(p_{V_k} + \delta). \qquad (6.4)$$
Hence, as the disjoint sojourns $\{[s_i^k, e_i^k] : 1 \leq i \leq r_k\}$ occur between times 1 and $e_{r_k}^k$, their total length is less than $e_{r_k}^k$, and we deduce $n(p_{V_k} - \delta) < e_{r_k}^k$.

Now, for $p \in \Delta_m \setminus \{\mathbf{1}, \ldots, \mathbf{m}\}$, at least one of the $\{p_{V_i} : V_i \neq V_{j+1},\ 1 \leq i \leq j\}$ is positive: indeed, there are two coordinates of $p$, say $p_s$ and $p_t$, which are positive. Say $V_{j+1} \neq s$; then, as $(1/n)\sum_{i=1}^n 1_s(\omega_i) = (1/n)\sum_{i=1}^{a_j - 1} 1_s(\omega_i)$, $|(1/n)\sum_{i=1}^n 1_s(\omega_i) - p_s| < \delta$, and $p_s - \delta > 0$, the path must visit state $s$ before time $a_j$, e.g. $V_i = s$ for some $1 \leq i \leq j$.

Then, from the deduction just after (6.4), we have
$$n(\underline p - \delta) < n \max_{1 \leq i \leq j} (p_{V_i} - \delta) < \max_{1 \leq i \leq j} e_{r_i}^{i} \leq e_{r_j}^{j} = a_j - 1,$$
giving the first statement.

For the second statement, note that $-s_{r_j}^j + \sum_{i=1}^{r_j - 1} (e_i^j - s_i^j + 1)$ (with the convention that the sum vanishes when $r_j = 1$) is the same for all paths in $A_j(\mathbf{U}, \mathbf{V})$, being some combination of $\{U_i : 1 \leq i \leq j-1\}$. Hence, with $k = j$ in (6.4), we observe that $a_j = e_{r_j}^j + 1$ takes on at most $\lfloor 2n\delta + 1 \rfloor$ distinct values. The result now follows as paths in $A_j(\mathbf{U}, \mathbf{V})$ are in 1:1 correspondence with last switch times $a_j$. □

Acknowledgement. We thank M. Balazs and J. Pitman for helpful communications.